Skip to content

<fix>[identity]: propagate ExternalTenantContext via TaskContext for cascade resources#3721

Open
ZStack-Robot wants to merge 1 commit intofeature-zcf-v0.1from
sync/hanyu.liang/fix-1147
Open

<fix>[identity]: propagate ExternalTenantContext via TaskContext for cascade resources#3721
ZStack-Robot wants to merge 1 commit intofeature-zcf-v0.1from
sync/hanyu.liang/fix-1147

Conversation

@ZStack-Robot
Copy link
Copy Markdown
Collaborator

Problem

ExternalTenantResourceTracker fails to track cascade resources (Volume, NIC, CdRom) created during VM creation. Only the top-level VM gets an ExternalTenantResourceRefVO record.

Root cause: VmInstanceBase.handle(InstantiateNewCreatedVmInstanceMsg) uses thdf.chainSubmit() which executes the FlowChain in a thread-pool worker thread. The ThreadLocal ExternalTenantContext is lost during this thread handoff, so when cascade resources are persisted and notifyResourceOwnershipCreated() is called, ExternalTenantContext.getCurrent() returns null.

Fix

Piggyback on ZStack's existing TaskContext propagation infrastructure:

  1. Write side (BeforeDeliveryMessageInterceptor): When restoring ExternalTenantContext to ThreadLocal, also store it in TaskContext via putTaskContextItem().

  2. Read side (notifyResourceOwnershipCreated): If ThreadLocal is null, fallback to TaskContext.getTaskContextItem() which is automatically propagated across thread boundaries by HasThreadContextAspect / SetThreadContextAspect.

This works because:

  • HasThreadContextAspect.aj auto-captures TaskContext snapshot when any HasThreadContext object (ChainTask, Completion, etc.) is constructed
  • SetThreadContextAspect.aj auto-restores TaskContext before ChainTask.run(), Completion.success/fail(), etc.

Changes

  • ExternalTenantResourceTracker.java: ~30 lines added/changed in 1 file

Resolves

ZSTAC-1147

sync from gitlab !9588

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 10, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)

Review profile: CHILL

Plan: Pro

Run ID: 60950178-3a9f-416e-9710-1ef72e7dfa9a

📥 Commits

Reviewing files that changed from the base of the PR and between 7d1148a and cb3ba66.

📒 Files selected for processing (2)
  • identity/src/main/java/org/zstack/identity/ExternalTenantResourceTracker.java
  • identity/src/main/java/org/zstack/identity/ExternalTenantZQLExtension.java
🚧 Files skipped from review as they are similar to previous changes (1)
  • identity/src/main/java/org/zstack/identity/ExternalTenantZQLExtension.java

Walkthrough

在 ExternalTenantResourceTracker 中加入基于 TaskContext 的外部租户上下文回退与同步,更新消息发送/接收拦截器以读写 TaskContext;在 ExternalTenantZQLExtension 的子查询中可选加入按 userId 的额外过滤条件。

Changes

Cohort / File(s) Summary
TaskContext 上下文传播与拦截器
identity/src/main/java/org/zstack/identity/ExternalTenantResourceTracker.java
新增常量 TASK_CONTEXT_EXTERNAL_TENANTresolveCurrentContext() 回退逻辑。发送端在 ThreadLocal 缺失时从 TaskContext 恢复上下文,接收端在读取/移除 header 时同时写入/清理 TaskContext;资源所有权通知使用回退后的上下文。
ZQL 子查询 userId 过滤
identity/src/main/java/org/zstack/identity/ExternalTenantZQLExtension.java
在生成的子查询中可选地添加 etref.userId = '<userId>' 过滤:读取 tenantCtx.getUserId() 并在非空且通过验证时将其作为额外 WHERE 子句的一部分,否则不添加该过滤。

Sequence Diagram(s)

sequenceDiagram
    participant Sender as "消息发送方\n(线程)"
    participant TaskCtx as "TaskContext\n(任务范围存储)"
    participant MsgBus as "消息总线\n(拦截器)"
    participant Receiver as "消息接收方\n(线程 / Session)"
    participant DB as "持久层\n(ExternalTenant refs)"

    Sender->>TaskCtx: (可选) put TASK_CONTEXT_EXTERNAL_TENANT
    Sender->>MsgBus: sendMessage(msg)
    MsgBus->>TaskCtx: 若 ThreadLocal 缺失 -> 读取 TASK_CONTEXT_EXTERNAL_TENANT
    MsgBus->>MsgBus: 附加 'external-tenant-context' header(如存在)
    MsgBus->>Receiver: 投递消息
    Receiver->>Receiver: beforeDeliveryMessage: 读取 header 或 session
    Receiver->>TaskCtx: 写入或移除 TASK_CONTEXT_EXTERNAL_TENANT
    Receiver->>Receiver: 设置/清除 ThreadLocal ExternalTenantContext
    Receiver->>DB: notifyResourceOwnershipCreated(读取 TaskContext/ThreadLocal)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 我是小兔跳代码,TaskContext 把线包,
线程断了不慌张,租户上下文会回收,
拦截器里写又读,ZQL 多了 userId,
小兔鼓掌说太妙,追踪稳当又牢靠! 🥕


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)

Check name Status Explanation Resolution
Title check ❌ Error 标题不符合指定格式要求。标题长度86字符,超过了72字符的限制。 缩短标题使其不超过72字符。例如:[identity]: propagate ExternalTenantContext via TaskContext
Docstring Coverage ⚠️ Warning Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Check name Status Explanation
Description check ✅ Passed 描述清晰地说明了问题、根本原因、修复方案及其技术原理,与代码变更高度相关。
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch sync/hanyu.liang/fix-1147

Comment @coderabbitai help to get the list of available commands and usage tips.

@MatheMatrix MatheMatrix force-pushed the sync/hanyu.liang/fix-1147 branch from f039a47 to 7d1148a Compare April 13, 2026 08:04
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@identity/src/main/java/org/zstack/identity/ExternalTenantResourceTracker.java`:
- Around line 114-117: The code assumes session.getExternalTenantContext() is
non-null after session.hasExternalTenant(), but getExternalTenantContext() can
return null; avoid calling TaskContext.putTaskContextItem with a null value.
After retrieving ExternalTenantContext ctx in ExternalTenantResourceTracker,
check if ctx == null and if so execute the existing "clear" behavior (clear
current ExternalTenantContext and remove the TASK_CONTEXT_EXTERNAL_TENANT entry)
instead of calling TaskContext.putTaskContextItem; otherwise proceed to
ExternalTenantContext.setCurrent(ctx) and
TaskContext.putTaskContextItem(TASK_CONTEXT_EXTERNAL_TENANT, ctx).

In `@identity/src/main/java/org/zstack/identity/ExternalTenantZQLExtension.java`:
- Around line 77-78: The code currently throws SkipThisRestrictExprException
when SAFE_TENANT_VALUE.matcher(userId).matches() is false, which incorrectly
skips the entire external-tenant restriction; change this to a hard failure
instead of skipping: in ExternalTenantZQLExtension replace the throw of
SkipThisRestrictExprException for invalid userId with an explicit validation
failure (e.g., throw new IllegalArgumentException or a similar
BadRequest/Validation exception) including a clear message about the bad userId,
or alternatively if you prefer to ignore only the userId clause, remove the
SkipThisRestrictExprException and simply do not add the userId sub-clause while
still retaining the source + tenantId filters so the restriction is not dropped
entirely.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)

Review profile: CHILL

Plan: Pro

Run ID: c6eefadd-9332-409d-8e29-a7b5dd488142

📥 Commits

Reviewing files that changed from the base of the PR and between f039a47 and 7d1148a.

📒 Files selected for processing (2)
  • identity/src/main/java/org/zstack/identity/ExternalTenantResourceTracker.java
  • identity/src/main/java/org/zstack/identity/ExternalTenantZQLExtension.java

@ZStack-Robot
Copy link
Copy Markdown
Collaborator Author

Comment from yaohua.wu:

Review: MR !9588 — ZCF-1147

概述:本 MR 修复了两个问题:(1) 级联资源(Volume/NIC/CdRom)在 thdf.chainSubmit() 线程池切换后 ExternalTenantContext ThreadLocal 丢失,导致 ExternalTenantResourceRefVO 未写入;(2) ZQL 查询缺少用户级隔离,同一租户下不同用户可见彼此资源。修复方案利用 TaskContext 跨线程传播机制做 fallback,并在 ExternalTenantZQLExtension 中追加可选的 userId SQL 子句。

整体方案方向正确,TaskContext fallback 的思路与 ZStack 现有的 HasThreadContextAspect/SetThreadContextAspect 机制吻合。但有一个 安全问题 需要在合入前修复。


🔴 Critical — SkipThisRestrictExprException 导致非法 userId 时整个租户过滤被跳过

文件: identity/src/main/java/org/zstack/identity/ExternalTenantZQLExtension.java,新增行 77-78

if (!SAFE_TENANT_VALUE.matcher(userId).matches()) {
    throw new SkipThisRestrictExprException();
}

问题:当 userId 不符合 SAFE_TENANT_VALUE 白名单正则时,抛出 SkipThisRestrictExprException 会导致 整条 ExternalTenantZQLExtension 生成的限制表达式被跳过——即不仅跳过 userId 子句,连 source + tenantId 的过滤也一并消失。

安全风险:攻击者只需构造一个不匹配正则的 userId(如 "a]]; DROP TABLE--"),就能使 ZQL 查询在 无任何租户隔离 的情况下执行,实现跨租户数据泄露(权限提升)。

建议修复

  • 方案 A(推荐):对非法 userId 直接抛出明确的校验异常(如 throw new OperationFailureException(argerr("invalid external tenant userId: %s", userId))),拒绝本次查询。
  • 方案 B:忽略 userId 子句,但仍保留 source + tenantId 过滤——即把 userFilter 设为 "" 而非抛异常。这会退化到租户级隔离,但不会完全无过滤。

🟡 Warning — TaskContext.putTaskContextItem() 可能接收 null 值导致 NPE

文件: identity/src/main/java/org/zstack/identity/ExternalTenantResourceTracker.java,新增行 116-117

ExternalTenantContext ctx = session.getExternalTenantContext();
ExternalTenantContext.setCurrent(ctx);
TaskContext.putTaskContextItem(TASK_CONTEXT_EXTERNAL_TENANT, ctx);

问题hasExternalTenant() 的实现是 externalTenantContext != null && externalTenantContext.getTenantId() != null,但 getExternalTenantContext() 直接返回字段引用。虽然逻辑上在 hasExternalTenant() == true 时不应为 null,但这两个调用非原子操作(同一 SessionInventory 可能被并发修改)。更关键的是:TaskContext 内部使用 ConcurrentHashMap,调用 put(key, null) 会直接抛出 NullPointerException。同一 MR 中 ExternalTenantZQLExtension 已经在 hasExternalTenant() 之后显式检查了 tenantCtx == null,说明作者也意识到了这种不一致。

建议修复:在写入 TaskContext 前加 null 守卫:

if (session != null && session.hasExternalTenant()) {
    ExternalTenantContext ctx = session.getExternalTenantContext();
    if (ctx != null) {
        ExternalTenantContext.setCurrent(ctx);
        TaskContext.putTaskContextItem(TASK_CONTEXT_EXTERNAL_TENANT, ctx);
    } else {
        ExternalTenantContext.clearCurrent();
        TaskContext.removeTaskContextItem(TASK_CONTEXT_EXTERNAL_TENANT);
    }
}

🟡 Warning — TaskContext fallback 逻辑重复,应抽取公共方法

文件: ExternalTenantResourceTracker.java,行 82-88 与行 152-158

以下 fallback 逻辑在 beforeSendMessage()notifyResourceOwnershipCreated() 中完全重复:

if (ctx == null || ctx.getTenantId() == null) {
    Object taskCtx = TaskContext.getTaskContextItem(TASK_CONTEXT_EXTERNAL_TENANT);
    if (taskCtx instanceof ExternalTenantContext) {
        ctx = (ExternalTenantContext) taskCtx;
    }
}

建议抽取为 private static ExternalTenantContext resolveCurrentContext() 方法,减少维护成本,也避免未来新增调用点时遗漏 fallback。


🟢 Suggestion — 消息 header 中 userId 为 null 时的数组行为

文件: ExternalTenantResourceTracker.java,行 92

msg.putHeaderEntry(HEADER_EXTERNAL_TENANT,
    new String[]{ctx.getSource(), ctx.getTenantId(), ctx.getUserId()});

ctx.getUserId() 为 null 时,new String[]{..., null} 会在数组中放入 null 元素。虽然接收端 tenantData.length >= 3 ? tenantData[2] : null 能正确处理,但建议显式判断:仅当 userId != null 时才放入三元素数组,否则沿用旧的两元素格式,保持向后兼容性并避免序列化层面的意外行为。


🟢 Suggestion — 注释完善

TASK_CONTEXT_EXTERNAL_TENANT 常量上方的 Javadoc 注释写得很清晰,建议在 beforeSendMessage()notifyResourceOwnershipCreated() 的 fallback 代码块注释中补充一句:"此处不设置 ThreadLocal,因为当前线程的 ThreadLocal 状态应由 SetThreadContextAspect 在入口处恢复,此处仅读取",帮助后续维护者理解为什么 fallback 只读不写。


总结

严重级别 数量 说明
🔴 Critical 1 SkipThisRestrictExprException 导致租户过滤完全失效
🟡 Warning 2 NPE 风险 + 代码重复
🟢 Suggestion 2 header 数组兼容性 + 注释

结论:🔴 Critical 问题是安全漏洞,建议修复后再合入。其余问题可视情况在后续迭代中处理。


🤖 Robot Reviewer

1. ExternalTenantResourceTracker: add TaskContext fallback in both
   receive-side (notifyResourceOwnershipCreated) and send-side
   (beforeSendMessage) interceptors to recover ExternalTenantContext
   when ThreadLocal is lost across thdf.chainSubmit thread-pool handoff.

2. ExternalTenantResourceTracker: store tenant context in TaskContext
   alongside ThreadLocal in beforeDeliveryMessage interceptor so
   FlowChain workers inherit it via HasThreadContextAspect.

3. ExternalTenantZQLExtension: add optional userId filter to the
   generated SQL subquery for user-level resource isolation within
   a tenant.

Resolves: ZCF-1147

Change-Id: I4dc60e51251fbc97841edd999a53fb7495bd63d1
@MatheMatrix MatheMatrix force-pushed the sync/hanyu.liang/fix-1147 branch from 7d1148a to cb3ba66 Compare April 13, 2026 08:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants